Text File | 1992-06-29 | 10KB | 201 lines
═══════════════════════════════════════════════
3. WHY YOU WANT TO USE MIR TUTORIALS
═══════════════════════════════════════════════
════════════════════════════════
3.1 The end user and the
thirst for knowledge
════════════════════════════════
People need to know. To know is to have understanding.
To know is to recognize the nature of something going on in our
world. For each of us, knowing is the key to control over our
environment. To know is to gain self-esteem and confidence.
Knowledge equips a person to create value. And creating or adding
value is what our working life is about. Knowledge is always the
first step.
The objective of computerized indexing and retrieval is
to serve people's need for knowledge. The objective is NOT tidy
techniques; it is service and empowerment. Efficient techniques
are simply a means to the end. For those who insist that we focus
on profit and the bottom line, consider this: If we keep improving
in our recognition of human need and our service of that need (and
if we don't "park our brains at the door" in the process), that is
the surest way to ongoing profit.
Everything that follows seeks to give control to the
end user. Knowledge itself increases a person's control over his
or her world. The tools that we put in the hands of people
searching for information should likewise increase (rather than
diminish) control. Every element of design and technique in the
MIR project starts with user needs. In simple terms, people
matter. If that sounds like a plea for market-oriented technology,
yes, it is!
═════════════════════════════════
3.2 Coping with data glut
═════════════════════════════════
People need to know. But facts, or data, are not in
themselves knowledge. Facts are like jigsaw puzzle pieces. We
must have the pieces, or the puzzle will not come together. And we
don't want to miss any relevant facts. But there are too many
facts... jigsaw puzzle pieces... that don't contribute to our
specific aim at any one point in time. Piling on more and more
facts does not necessarily lead to knowledge. Data without
recognizable patterns is noise. Noise leads to stress and loss of
function. If there is a feeling of being swamped with data,
finding desired patterns is all that much harder.
Change has become the norm, change driven by forces
such as the proliferation of new products, government regulation,
social and technical complexity, communication improvements,
customer autonomy, fragmenting markets, and so forth. One notable
result: our world is awash in a sea of data. Organizations have to
keep track of far more details than ever before. Consider your
employer as an example, or any government department with which you
are familiar; how much more data is kept today than ten years ago?
With few exceptions, you find that there is an exponential
explosion of data kept, data required, and information to be
retrieved.
Numbers of databases are growing. So is the size of
the typical accumulation of data. This is illustrated by what
happened in the CD-ROM industry. When compact optical discs were
first used for storing computer data in 1985, people wondered how
a disc with a capacity of more than half a gigabyte could ever be
fully used. Now it is common for a single database to span several
CD-ROMs. The cost of new storage technology for personal computers
is dropping fast. More storage means more data, and that in turn
means an increasing need for quality search capability.
════════════════════════════
3.3 Empowering users
════════════════════════════
"I want what I want when I want it." True for
executives. True for two-year-olds. And, if we care to admit it,
true for ourselves when we are searching for information.
Anything can be found, if one has forever to find it.
But the average person hasn't got forever. And the time that is
available is too precious to be used staring at a "searching
database..." message on a non-responsive computer screen. Now even
the most amateur retrieval system finds things fast within a small
sample (which explains why so many sales demonstrations are done
with small samples). More sophisticated textbook indexing methods
have acceptable levels of delay for 20,000 (and sometimes even as
high as 100,000) records. But today's databases very often exceed
these limits. So there has been a shakeout among computer methods
of indexing and retrieving information. Only the more powerful
techniques of indexing and retrieval can compete on gigabyte-sized
tasks.
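Why a scan that feels instant on a small sample falls behind on a
large collection can be seen in a minimal sketch. This is an
illustration in Python, not MIR code: the function names, the crude
whitespace tokenizer, and the sample records are all assumptions
made for the example. The scan touches every record on every query;
the index is built once, after which each query is a single lookup.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase a record and split it on whitespace (deliberately crude)."""
    return text.lower().split()

def linear_search(records, term):
    """Scan every record on every query: work grows with collection size."""
    term = term.lower()
    return [i for i, text in enumerate(records) if term in tokenize(text)]

def build_index(records):
    """One-time pass: map each term to the sorted record numbers containing it."""
    index = defaultdict(set)
    for i, text in enumerate(records):
        for token in tokenize(text):
            index[token].add(i)
    return {term: sorted(ids) for term, ids in index.items()}

def indexed_search(index, term):
    """Answer a query with one dictionary lookup instead of a full scan."""
    return index.get(term.lower(), [])

# Hypothetical sample records, invented for this sketch.
records = [
    "Optical discs store data",
    "Magnetic tape was used for government data",
    "Search must be fast",
]
index = build_index(records)
```

Both functions return the same record numbers for the same term. On
three records the difference in speed is invisible, which is the
point made above about small-sample demonstrations; the gap only
shows when the number of records becomes large.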
The primary need is to place a high value on people's
time. (So many managers miss this simple truth.)
A second basic need is simplicity. This derives from
the need to value the user's time. People do not want to invest
time in reading manuals and learning complex systems. Ideally, the
searcher should be able to re-use a familiar and preferred search
and retrieval system on any new set of data that comes to hand.
Maximum gain; minimum pain.
The third need is access. People are empowered to find
information as timely data is made available to them at reasonable
cost.
══════════════════════════════════
3.4 Empowering an industry
══════════════════════════════════
Over the past 30 years, an entire industry has grown up
around the requirement to equip persons and/or organizations to
extract useful information simply and quickly from quantities of
data. The industry launched itself primarily from government data
which was distributed on paper, microfilm, microfiche, punched
tape, and eventually magnetic tape. By the end of the 1960s, the
industry was experimenting with on-line electronic information
services. It was the advent of personal computers and optical
discs in the early 1980s that made possible information services at
dramatically lowered costs. Electronic search split quickly into
on-line for the most current data and CD-ROM for historical data.
The lowest-cost medium has been CD-ROM, which offers the potential
for random access across more than 600 million bytes in under two
seconds.
The ongoing needs of this industry have to do with
development costs, the vast array of formats in which data is
received, and disarray with respect to standards.
Development time and costs are much too high because
all the better indexing systems have been proprietary. MIR
Tutorials and software aim to make world class search and retrieval
systems available to the public under the Free Software Foundation
"copyleft" rules. Firms may adapt MIR source code for their
commercial purposes without payment of license fees or royalties.
Costs are also reduced as firms take advantage of automated
indexing techniques.
Variability in format of data to be indexed may
diminish in the long term, but for at least the remainder of the
twentieth century it will continue to be a problem. We address the
problem here by offering techniques to cover a wide range of cases.
Standards are an issue because the end user is often
forced to learn a new retrieval system when access to a new
database is acquired. The current standard for organizing files on
CD-ROM is ISO 9660. This governs how file locations on a
compact disc are listed in a directory, but has no bearing
whatsoever on the actual content of an index file. Other standards
have been developed, for example, Office Document Architecture
(ODA) and Standard Generalized Markup Language (SGML). These are
helpful, but they too do not make it possible to search in uniform
ways across totally different databases. CD-RDx and SFQL
(Structured Full-Text Query Language) each propose to deal with the
problem by engine-independent techniques in which the software
dealing with proprietary indexes is separated from the software
experienced by the searcher. Each approach has merit; a variation
of one or the other may become a new standard in CD-ROM usage. The
MIR project aims to facilitate the advance by suggesting index
structures compatible with either system. By reducing costs so
that many players may use similar structures, and by encouraging
improvements and discussion through interactive publishing, we lay
the groundwork for development of more extensive standards in the
future.
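The engine-independent idea described above, in which software
that handles a proprietary index is kept separate from the software
the searcher sees, can be sketched in a few lines of Python. This
is an assumption-laden illustration only: it does not reproduce
CD-RDx, SFQL, or any MIR interface, and every class, method, and
record shown here is hypothetical.

```python
from typing import List, Protocol

class SearchEngine(Protocol):
    """Hypothetical engine-side interface; the index format behind it stays hidden."""
    def find(self, term: str) -> List[int]: ...
    def fetch(self, record_id: int) -> str: ...

class ScanEngine:
    """Engine A: no index at all, just a scan over records held in memory."""
    def __init__(self, records):
        self._records = list(records)

    def find(self, term):
        term = term.lower()
        return [i for i, r in enumerate(self._records) if term in r.lower().split()]

    def fetch(self, record_id):
        return self._records[record_id]

class TableEngine:
    """Engine B: a prebuilt term table, standing in for a proprietary index."""
    def __init__(self, records):
        self._records = list(records)
        self._table = {}
        for i, r in enumerate(self._records):
            for token in set(r.lower().split()):
                self._table.setdefault(token, []).append(i)

    def find(self, term):
        return list(self._table.get(term.lower(), []))

    def fetch(self, record_id):
        return self._records[record_id]

def search(engine: SearchEngine, term: str) -> List[str]:
    """Searcher-side code: identical no matter which engine is plugged in."""
    return [engine.fetch(i) for i in engine.find(term)]

# Hypothetical records, invented for this sketch.
docs = [
    "Census tables on magnetic tape",
    "Patent abstracts on optical disc",
    "Tape archive catalogue",
]
for engine in (ScanEngine(docs), TableEngine(docs)):
    print(search(engine, "tape"))
```

The searcher-side function never learns how an engine stores its
index. Swapping one engine for another changes nothing the end user
has to relearn, which is the benefit the two proposals are after.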
══════════════════════════════
3.5 Beyond fast search
══════════════════════════════
Automated indexing and full text search of a wide
variety of data are, in themselves, immensely worthwhile. But the
value of this technology goes much further. It serves as a
foundation for other possibilities. Among them...
» concept search;
» self-indexing hard disk systems for personal computers;
» correlation software;
» automated detection of trends within a company's production
or financial control system;
» records management applications;
» operating systems with indexing power;
» software that learns.
More on these in TUTORIAL FIVE!